BUG: Fix pyarrow and numpy logical bug concerning bool and string #60529

Open
wants to merge 6 commits into
base: main

@ldlin1 commented on Dec 9, 2024

Logical operators (e.g., |, &) applied to non-boolean data should cast that data to bool, and this works for most types (e.g., float, strings). However, these operations fail with pyarrow-backed strings and numpy-backed strings.

This PR fixes the issue for pyarrow-backed string arrays by casting them to boolean arrays when they are used with logical operators. Two new helper functions, convert_string_to_boolean_array and cast_for_logical, perform the cast, and the ARROW_LOGICAL_FUNCS dictionary now uses these helpers when performing logical operations (see pandas/core/arrays/arrow/array.py).

This PR fixes the issue for numpy-backed string arrays by casting them to boolean arrays whenever they are combined with boolean arrays in logical operations. This is done in the logical_op function (see pandas/core/ops/array_ops.py).
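
The numpy-side behavior described above can be sketched roughly as follows. This is an illustration only, not the code in logical_op: the function name logical_or_with_strings is hypothetical, and it assumes the strings live in an object-dtype array and are cast by Python truthiness (empty string is False).

```python
import numpy as np


def logical_or_with_strings(left, right):
    """Hypothetical sketch (not from the PR): cast object-dtype operands
    to bool by truthiness, then apply an elementwise logical OR."""
    left = np.asarray(left)
    right = np.asarray(right)
    if left.dtype == object:
        left = left.astype(bool)  # "" -> False, non-empty -> True
    if right.dtype == object:
        right = right.astype(bool)
    return np.logical_or(left, right)


flags = np.array([True, False, False])
words = np.array(["a", "", "b"], dtype=object)

result = logical_or_with_strings(flags, words)
print(result.tolist())
```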

@ldlin1 changed the title from "BUG: Fix pyarrow logical bug concerning bool and string" to "BUG: Fix pyarrow and numpy logical bug concerning bool and string" on Dec 9, 2024
Successfully merging this pull request may close these issues.

BUG (string dtype): logical operation with bool and string failing
2 participants